Abstract

The centrality of assessment for facilitating thinking, reasoning, and problem solving is 
well-documented and indisputable. Less apparent is how to create informative, yet 
practical measures for classroom use. Clearly, the changing of assessments alone will not 
in and of itself improve learning; teachers' beliefs and practices will need to be altered
with various levels of support. The design of assessment situations can nevertheless have 
a substantial impact on the quality of information provided to teachers and students for 
instructional decision-making and meaningful learning.

This report considers principles of informative assessments that improve teaching and 
learning by communicating learning goals, interpreting student performance, tracking 
progress over time, and suggesting appropriate corrective actions. In the report, we 
describe several properties of assessment design that enable teachers and students to 
describe progress in terms of cognitive features of performance, and then act on that 
information to improve learning. We review classroom assessment programs across 
subject matters and grade levels in order to suggest essential design elements for tasks, 
score forms, and interpretive materials that maximize the information provided by
assessment of performance and competence. These principles are not intended to be
comprehensive, but are meant to highlight some promising areas for informative 
assessment research. 



Introduction 

The centrality of assessment for facilitating thinking, reasoning, and problem 
solving is well-documented and indisputable (Black & Wiliam, 1998; Glaser & Silver, 
1994; National Research Council [NRC], 2001; Shepard, 2000a). Less apparent is how 
to create informative, yet practical measures for classroom use. Clearly, the changing
of assessments alone will not in and of itself improve learning; teachers' beliefs and 
practices will need to be altered with various levels of support (Borko, Mayfield, 
Marion, Flexer, & Cumbo, 1997; Shepard, 2000b). The design of assessment 
situations can nevertheless have a substantial impact on the quality of information 
provided to teachers and students for instructional decision-making and meaningful 
learning. 

This report considers principles of informative assessments that improve 
teaching and learning by communicating learning goals, interpreting student 
performance, tracking progress over time, and suggesting appropriate corrective
actions. Presently, the drive to identify and develop such tools is more compelling 
than ever before. Demands for standards and accountability (No Child Left Behind 
Act of 2001) have brought assessment to the forefront of educational policy 
concerns. Renewed interest in uniting the fields of cognitive psychology and
psychometrics has also generated conversation and commitment for the
improvement of national, state, and classroom testing (Pellegrino, Baxter, & Glaser, 
1999; NRC, 2001). Together, these shifts in political and intellectual climates have 
created unprecedented opportunities for the exploration of informative assessment 
techniques. 

Our approach is grounded on several assumptions. We presume that 
informative assessment, like formative measures, can provide "a clear view of the
learning goals, information about the present state of the learner, and action to close
the gap" (NRC, 2001, p. 229). The term "formative," however, carries a chronological 
connotation and emphasizes the placement of assessments during the course of an 
instructional unit (Black & Wiliam, 1998; Bloom, Hastings, & Madaus, 1971). We use 
the label "informative" to draw attention to the instructional purpose for improving 
student learning (New South Wales Department of Education and Training
[NSWDET], 1998). Further, assessments can be informative of various aspects of
achievement for different audiences. All assessments are informative in some way; 
the key issues are of what, for whom, and how those measures inform. We concentrate 
on the information given to teachers and students to facilitate the teaching and 
learning of thinking, reasoning, and problem solving as advocated by various sets of
national standards (e.g., National Council of Teachers of Mathematics [NCTM],
1995; NRC, 1996).

We also adhere to the general framework for assessment design put forth by
the Committee on the Foundations of Assessment (NRC, 2001). The committee
identifies the three key elements of assessment as: (a) cognition, or theories about
learning, performance, and targets for assessment; (b) observation, or tasks used to
elicit information about learning; and (c) interpretation, or methods for scoring and
validating assessment results. We examine how these elements are presented to
teachers and students in clear, coherent, and instructionally meaningful ways.

The sections to follow describe several properties of assessment design that
enable teachers and students to describe progress in terms of cognitive features of
performance, and then act on that information to improve learning. We review
classroom assessment programs across subject matters and grade levels in order to
suggest essential design elements for tasks, score forms, and interpretive materials
that maximize the information provided by assessment of performance and
competence. These principles are not intended to be comprehensive, but are meant
to highlight some promising areas for informative assessment research.

Properties of Assessment to Inform Teachers 

A key aspect of teaching has always been monitoring students' progress.
Teachers traditionally do this by giving curriculum-based classroom tests and
judging the number of correct responses. Unfortunately, this usual approach to
assessment often does not provide the information that teachers could use in order
to improve student proficiency. In this report, we provide examples and instances of
approaches to assessment that can effectively elicit and display information about
student achievement (i.e., not simply their knowledge of subject matter, but their
ability to use that knowledge to solve problems and reason about novel situations).
We anticipate changes in classrooms of the future will occur as assessments of
thinking, reasoning, and problem solving are integrated with instruction to inform
teaching and learning.

The unique, fundamental nature of informative assessment is its ability to
prepare teachers for effective instructional activity based on detailed knowledge of
student accomplishment. The specification of cumulative objectives is a powerful
component of good educational programs and, in this context, informative
assessment supports the teachers' pedagogical skill and judgment. Taking full
advantage of informative assessment requires a particular use of adaptive teaching
techniques; in other words, information about the student's learning process and the
nature of accomplishment sets teachers up for appropriate instructional
modifications. To do so, assessment must effectively communicate instructional and
curriculum changes that enable teachers to see the student's displayed performance
as connected to details of classroom practice.

We discuss four properties of informative assessment for teachers that display 
cognitive components of performance and suggest instructional interventions. First, 
we suggest that a cognitive language is essential for helping teachers interpret 
assessment results and promote complex problem solving. Second, we demonstrate 
how rubrics can distinguish and communicate various qualities of thinking and
reasoning. Third, we explore the importance of explicit relationships between
assessments and instruction that reinforce consistent expectations for teaching and
learning. Finally, we discuss how assessments can suggest guidelines for instruction. 

A Cognitive Language for Teachers 

During informative assessment, descriptions of learning goals and
performances are couched in a language compatible with teachers' practices,
experiences, and beliefs. Teachers tend to communicate to one another in a
"language of the particular" (Leinhardt, 1990), or of how to teach under specific
circumstances. At the same time, there is a growing body of cognitive theory and its
application that can inform instructional situations (e.g., Carver & Klahr, 2001;
McGilly, 1994). Making this knowledge accessible to teachers is essential to the
creation and dissemination of informative assessments. What is necessary is a
language of learning and cognition for teacher practice that steers away from
"molecular" learning theory, and toward a level of discourse that relates learning
processes to instructional performances.

Consider Cognitively Guided Instruction (CGI), a program that establishes a
cognitive language through frameworks for the assessment of elementary grades'
mathematical problem-solving (Carpenter, Fennema, & Franke, 1996; Carpenter,
Fennema, Franke, Levi, & Empson, 1999). Their language consists of a taxonomy of
common problem types and solution strategies. The researchers observed that across
computational problems (i.e., addition/subtraction and multiplication/division),
numerical literacy progresses from rigid, exclusive dependence on manipulatives
(e.g., base-10 blocks) to more flexible, abstract mental representations of numbers.
Moreover, students use different strategies based on two features of the question: (a)
the operation to be performed between the two numbers (e.g., joining, separating),
and (b) the location of the unknown value within the problem (e.g., 2 + 3 = ? vs.
2 + ? = 5). The interaction of these two features sets up expectations for typical student
behaviors and alternative problem-solving strategies. When teachers are able to
classify problem types and solution strategies, they can encourage students to
consider various ways to approach questions.
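
The two problem features described above can be illustrated with a minimal sketch. It assumes problems are written as simple strings such as "2 + ? = 5"; the function name `classify` and the labels for the unknown's position are hypothetical conveniences for this illustration, not CGI's own terminology or tooling.

```python
# Illustrative only: maps an operator symbol to the operation names used in the text.
OPERATIONS = {"+": "joining", "-": "separating"}


def classify(problem: str):
    """Return (operation, location of the unknown) for a string like '2 + ? = 5'.

    The position labels ('first number', 'second number', 'result') are
    assumptions made for this sketch.
    """
    left, right = (side.strip() for side in problem.split("="))
    tokens = left.split()
    if len(tokens) != 3 or tokens[1] not in OPERATIONS:
        raise ValueError(f"unsupported problem format: {problem!r}")
    first, operator, second = tokens
    slots = {"first number": first, "second number": second, "result": right}
    unknown = [name for name, value in slots.items() if value == "?"]
    if len(unknown) != 1:
        raise ValueError("expected exactly one unknown marked with '?'")
    return OPERATIONS[operator], unknown[0]


print(classify("2 + 3 = ?"))
print(classify("2 + ? = 5"))
```

Both example problems involve joining, but the unknown sits in a different position, which is the distinction the framework uses to anticipate different student strategies.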

CGI illustrates several key features of an informative, cognitive language for 
educators. Terminology (e.g., problem types and strategies) is derived from 
observable student practices and illustrated with multiple classroom examples 
(Carpenter et al., 1999). Further, the frameworks are extremely flexible in that they 
are not linked to specific curricula or prescribed teaching protocols, but instead give 
teachers leeway to select problems to assess understanding. Teachers can then 
modify instruction through the use of progressively simpler or more difficult 
problems, and the discussion and modeling of alternative strategies (Carpenter et 
al., 1996, 1999). In this sense, CGI frameworks provide a language that is highly 
adaptable; teachers are given tools for identifying student performance without
constraining instructional practices.^1

In other programs, metaphors provide a basis for dialogue between teachers
and instructional designers (e.g., Martinez, Sauleda, & Huber, 2002). The Classroom 
Assessment as the Basis for Teacher Change (CATCH) project is dedicated to 
helping teachers improve their instructional practices by way of innovative, 
informative assessments (CATCH, 2002). Its keystone is a pyramid metaphor of 
problem types (Verhage & de Lange, 1997). In this scheme, tasks are fit into a three- 
tiered, three-dimensional pyramid. The rows, or tiers, represent increasingly 
complex levels of thinking: (a) simple rote reproduction of facts and algorithms; (b)
connections across mathematical problems or domains (e.g., generating
combinations of numbers which, when added or subtracted, equal five); and (c)
analyses of situations that require novel, self-generated models or solution
strategies. The vertical dimension of the pyramid refers to the various domains of
mathematics, such as geometry, algebra, and probability. Finally, the depth
dimension describes relative problem difficulty. 

The pyramid metaphor communicates selected principles of assessment
implementation. In particular, its shape (wide at the bottom, narrow at the top)
illustrates the relative number of problems that should appear on a balanced test of
mathematics skills, with higher level tasks at the narrow top of the pyramid taking
more time to complete and appearing less frequently than rote recall questions.
Ideally, teachers should strive to fill the pyramid, eventually testing their students in
all content dimensions at all problem levels (Dekker, Querelle, & van den Boer,
2000). Missing from this metaphor is a model of knowledge and skill development, a
hallmark of CGI. It might be assumed that skills build upon one another and that
students must possess a firm foundation in basic algorithms before proceeding to
more complex problems, but this is not confirmed in project literature. Nevertheless,
the CATCH project and assessment pyramid demonstrate the potential effectiveness
of metaphors and concrete imagery for representing complex mathematical concepts
for assessment and instruction.

1 The reader is encouraged to consult CGI's handbook (Carpenter et al., 1999) for extended
examples of its principles in classroom practice.

Rubrics Emphasizing Cognitive Elements of Performance 

Performance rubrics are convenient, readily available tools for characterizing 
student learning (Arter & McTighe, 2001; Luft, 1999; Shafer, Swanson, Bene, & 
Newberry, 2001). Rubrics can be defined as systems for rating the quality of a
particular assessment performance. Essential to a rubric is the notion of levels that
can distinguish different quality performances (Arter & McTighe, 2001). When
properly designed, they can distinguish and communicate various qualities of
thinking, reasoning, and problem solving. At issue is specifying the parameters or
principles for exemplary, informative classroom rubrics. 

At their best, rubrics can make students' thinking explicit and highlight areas
for growth. Balanced Assessment for the Mathematics Curriculum (1999) ranks tasks
on a four-level scale: "the student needs significant instruction," "the student needs
some instruction," "the student's work needs to be revised," and "the student's
work meets the essential demands of the task." Each task is accompanied by
specific objectives, performance descriptions, and examples of each level. Teachers
can then use this information to monitor students' performance under a variety of
formal and informal assessment conditions (Balanced Assessment, 1999). 

In one of the elementary grades' tasks, for instance, students are asked to
determine the ages of three dogs based on their combined ages and the ages of two
of the dogs relative to the third (e.g., "Jason is 5 years older than Boy Blue";
Balanced Assessment, 1999, p. 166). Scores reflect the correctness of the response
and attention to the problem details (i.e., difference between ages, combined ages of
the dogs). Students who need significant instruction failed to consider the relative
ages of the dogs or their combined ages, while students who need to revise their
work obtained the correct answer but did not explain how they did so. The rubrics
point out specific errors in students' logical reasoning and, in so doing, articulate
criteria for improvement.

Formulating effective rubrics demands an awareness of cognitive theory as 
realized in student performance (e.g., the developmental progression of students'
ideas of a particular concept or subject matter), combined with attention to context 
and detail. Among suggested guidelines for rubric development is that scores must 
emphasize quality of performance over quantity of information provided. In doing 
so, rubrics can challenge teachers' beliefs that performance can be defined 
exclusively by the number of details students include (Goldberg & Roswell, 1999). 
Tradeoffs must also be made between analytic and holistic rubrics (i.e., rubrics that 
analyze multiple aspects of performance versus overall quality), and generalized 
versus specific guidelines (i.e., using the same rubric for multiple assessment 
situations versus using a different rubric each time). While analytic and holistic 
rubrics are similar in technical qualities such as reliability (Arter & McTighe, 2001; 
Klein et al., 1998), analytic rubrics take longer to score yet are also judged more 
informative by teachers (Waltman, Kahn, & Koency, 1998). Likewise, generalized 
rubrics may be easier to learn because of repeated practice opportunities, but may
leave out valuable information that only specifically attuned guidelines can provide.

Future consideration will need to be given to operationalizing the alignment of 
instructional situations and rubric complexity in order to optimize rubric use. If the 
teacher's goal is to obtain an overall impression of the class, for example, a holistic
rubric may suffice. If, instead, the aim is the diagnosis of students' particular
strengths and weaknesses, an analytic rubric may be more appropriate (Waltman et
al., 1998). Conversely, attention to the individual details of a response may be
misleading if the number of details is less important than the overall quality of
information included (Arter & McTighe, 2001; Klein et al., 1998). Ultimately, 
discussions of rubrics' informational value are incomplete without concomitant 
analysis of the instructional context. 

Explicit, Consistent Relationships Between Assessments and Instruction 

Informative assessments are administered in a carefully planned sequence
detailing students' past, present, and future goals and achievements. Assessments
are not conducted in isolation, but instead build upon each other toward an
overarching purpose. Synchrony between assessments and instruction reinforces
consistent expectations for teaching and learning. These relationships must be made
explicit to teachers, however, if they are to influence classroom learning. Several
examples of how these linkages are established and made apparent to teachers are
reported here.

KIDMAP software (NSWDET, 2001) demonstrates how technology can help 
teachers visualize relationships between assessments, and then craft individual 
learning trajectories for their students. The software functions as a comprehensive 
database containing national standards, performance rubrics, sample assessments 
and lesson plans, and comparison data from students across the country. This 
enables teachers to plan their instruction around predetermined standards by 
charting objectives, instruction, assessment, and available resources. The compact 
display of information facilitates its review and draws teachers' attention to lessons 
where they are not presently collecting a broad range of assessment evidence 
(NSWDET, 2001). On an individual student level, KIDMAP helps teachers record 
performance on specific syllabus outcomes or national indicators, and then create 
profiles over time and in comparison to other students. This in turn allows teachers 
to easily identify areas for individual and class improvement, with sample lesson 
plans and assessments providing options for follow-up activities (NSWDET, 2001). 

Assessments linked within units can also be beneficial. One approach exploits 
the potential of curriculum-embedded assessments to assess and develop skills 
needed for a summative, end-of-unit task. Consider Mystery Powders (Baxter, Elder, 
& Glaser, 1995; Baxter, Elder, & Shavelson, 1997; Baxter & Glaser, 1998), a series of 
assessments designed for a hands-on science unit of the same name in which 
elementary students study the reactions of five white powders to various indicators
(e.g., water, vinegar, iodine, heat). In the end-of-unit assessment, students must
identify the composition of six samples of white powder based on a list of five
options. Students are given water, vinegar, iodine, and a hand lens, and may also
consult their science laboratory notebooks for information on reactions between
powders and indicators. To identify the powders, students must collect confirming
and disconfirming evidence. For example, two of the powder options are cornstarch
only, and cornstarch and baking soda. To identify the cornstarch-only sample,
students should test it with iodine (which turns purple in the presence of starch) to
confirm that it contains cornstarch, but should also use vinegar (which fizzes only
with baking soda) to disconfirm, or rule out, the presence of baking soda.
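
The logic of confirming and disconfirming evidence in this example can be sketched in a few lines. This is only an illustration under stated assumptions: it covers just the two powder options and two indicator reactions named in the text, and the names `REACTS`, `OPTIONS`, and `consistent_options` are hypothetical, not part of the Mystery Powders materials.

```python
# Which ingredient each indicator reacts with, as stated in the text.
REACTS = {
    "iodine": "cornstarch",    # iodine turns purple in the presence of starch
    "vinegar": "baking soda",  # vinegar fizzes only with baking soda
}

# Only the two options named in the example; the full unit lists five.
OPTIONS = {
    "cornstarch only": {"cornstarch"},
    "cornstarch and baking soda": {"cornstarch", "baking soda"},
}


def consistent_options(observations, options=OPTIONS):
    """Return the options still consistent with the observed test results.

    `observations` maps an indicator name to True (reaction seen) or False.
    """
    remaining = []
    for name, ingredients in options.items():
        matches = all(
            (REACTS[indicator] in ingredients) == reacted
            for indicator, reacted in observations.items()
        )
        if matches:
            remaining.append(name)
    return remaining


# Confirming evidence alone leaves both options open.
print(consistent_options({"iodine": True}))
# Adding disconfirming evidence (no fizz with vinegar) rules one out.
print(consistent_options({"iodine": True, "vinegar": False}))
```

The iodine test alone cannot separate the two options; only the negative vinegar result narrows the answer to a single powder, which is why both kinds of evidence are needed.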

Data collected from the end-of-unit assessment revealed that students had 
difficulty gathering appropriate evidence to support their claims. Students rarely, if 
ever, used information to rule out powders. In response, two embedded assessments 
were developed to promote the skills students were lacking (Baxter et al., 1997). The 
tasks mirrored the general purpose and structure of the end-of-unit assessment in 
that they asked students to identify the composition of one or more white powder 
samples. They gave students practice in determining when they had gathered 
sufficient evidence to conclusively identify the samples, and alerted them to the 
need to rule out powders by a process of elimination. The associated score forms 
and score interpretation guidelines provided teachers with a means to understand 
the nature of evidence students used to identify the powders, and their ability to 
draw conclusions from their findings. This heightened awareness could form the 
basis of instructional interventions that reinforced the principle of adequate and 
definitive evidence. 

The Mystery Powders tasks demonstrate an important property of informative 
assessment development: that formative, or curriculum-embedded measurements 
are created after establishing an end-of-unit assessment or outcome. A more typical 
development routine is to insert assessments at regular intervals during instruction. 
This process tends to test the material that has just been covered without any regard 
to how the assessment relates to the rest of the unit. A more beneficial approach, 
illustrated in the Mystery Powders assessments and others (e.g., Roberts, Wilson, & 
Draney, 1997; Wiggins & McTighe, 1998), is to work backward from the end-of-unit 
assessment or desired learning results, so that all activities are directed toward a 
central outcome. In the process, this integrated system of assessments consistently 
reinforces unit goals and fosters the development of skills necessary to acquire 
proficiency. 

Interpretive Guidelines for Teaching 

Our exploration of the properties of informative assessment for teachers 
concludes with a consideration of the ways in which task feedback summarizes 
student performance and facilitates instructional changes. These guidelines can take 
on a variety of forms. This section highlights a few programs representative of the 
array of informative assessment designs.

Facets of Thinking (Minstrell, 2000, 2001) generates "prescriptions" for teachers
and students to improve learning. Its foundation is a series of hierarchically
organized clusters of ideas, or facets, that students may hold about a particular
phenomenon or subject. Facets are ranked in order of their expected development,
with lower numbers representing accepted conventions in that particular field, and
higher values indicative of a less principled understanding (Minstrell, 2000, 2001).

Students' facets are evaluated through, among other ways, DIAGNOSER, a
web-based assessment (DIAGNOSER, 2002; Hunt & Minstrell, 1994). The program
presents students with a series of multiple-choice questions asking them to predict
observations and justify their answers. Teachers can review the distribution of
answers and decide how to adapt instruction using "prescriptive activities"
provided on DIAGNOSER's web site (DIAGNOSER, 2002). The ranking of the facets
within a cluster (those ending in 0 or 1 being the most conceptually appropriate)
allows teachers to further evaluate the level of students' understanding and the types
of assumptions that must be corrected.
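
As a rough illustration of how a distribution of facet codes might be reviewed, the sketch below tallies a class's responses and reports the share whose code ends in 0 or 1. The facet codes in the example are invented for illustration, and the function names are hypothetical; nothing here reflects DIAGNOSER's actual data format or interface.

```python
from collections import Counter


def is_goal_facet(code: int) -> bool:
    """Codes ending in 0 or 1 are described as the most conceptually appropriate."""
    return code % 10 in (0, 1)


def summarize_facets(responses):
    """Tally facet codes for a class and compute the share that are goal facets."""
    counts = Counter(responses)
    if not responses:
        return counts, 0.0
    goal = sum(n for code, n in counts.items() if is_goal_facet(code))
    return counts, goal / len(responses)


# Invented facet codes for five students within one cluster.
counts, share = summarize_facets([310, 311, 314, 317, 310])
print(counts.most_common())
print(f"{share:.0%} of responses reflect goal facets")
```

A teacher reading such a tally would look at which non-goal codes dominate in order to decide which assumptions to address first.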

Consider the assessment of students' understanding of the effects of gravity
and media (e.g., air) on objects. Teachers may elicit students' prior knowledge
through a series of "elicitation questions" suggested on the DIAGNOSER web site
(DIAGNOSER, 2002). One such question asks students to predict the weight of an
object in space given that it weighs five pounds on Earth. Teachers may choose to
correct misunderstandings with a class demonstration of the reading of a scale in an
airless environment and other related activities, accompanied by the assignment of a
DIAGNOSER problem set (DIAGNOSER, 2002; Minstrell, 2000). The web site
provides follow-up activities for students according to the facet codes for their assessment
responses (DIAGNOSER, 2002). In time, students should be able to see that the
principles they first applied to the elicitation questions do not hold up in reality.

The Mystery Powders assessments (Baxter et al., 1995, 1997; Baxter & Glaser,
1998), described in the previous section, illustrate how an assessment system can
orient teachers to unit goals and student performance in a way that shapes their
teaching practices. Recall that the Mystery Powders assessment system consists of
three performance assessments (two curriculum-embedded, one end-of-unit), and
that all tasks are designed to draw attention to students' generation and
interpretation of necessary and sufficient evidence for identifying the composition of
unknown white powders. Teachers are given five types of information to help them
implement and score the assessments: (a) a sequence of assessment and instruction,
which outlines the unit lessons and points at which the performance assessments
should be administered; (b) instructions for each task, including objectives, materials
needed, time limits, grouping arrangements, and a script for giving students
directions; (c) a score form, with blank spaces for checking students' performance (i.e.,
identification of powders, and supporting evidence) against the correct answers; (d)
scoring instructions with examples; and (e) a score summary with instructions for
assessment interpretation. The latter two documents are particularly worthy of
discussion for their informational value.

The scoring instructions begin by identifying the evidence used for scoring and
the types of scores students will receive. In the Mystery Powders assessments,
students receive separate scores for the identification of the powders and the
evidence provided to support claims. Next follows a scoring rationale that explains
that students need to gather appropriate combinations of confirming and
disconfirming evidence to identify the powders. Teachers are taken step-by-step
through the process of marking the score form and assigning points. Answers and
evidence are scored on separate scales, with one point given for every correct
combination of powders identified, and zero to four points given per powder for
supplying complete and correct evidence. The score form includes two examples of
evidence at each point level (i.e., 0 through 4) to clarify scoring criteria.
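
The two separate scales can be illustrated with a short sketch. It assumes the teacher has already judged each sample's evidence and assigned it 0 to 4 points by hand; the function name, sample labels, and data layout are hypothetical and are not taken from the actual score form.

```python
def score_student(answers, evidence_points, answer_key):
    """Return (answer score, evidence score) on two separate scales.

    answers / answer_key: dict mapping a sample label to a powder composition.
    evidence_points: dict mapping a sample label to a teacher-assigned
    evidence rating from 0 to 4 (the rating itself is a human judgment).
    """
    for sample, points in evidence_points.items():
        if not 0 <= points <= 4:
            raise ValueError(f"evidence points for {sample!r} must be 0 through 4")
    # One point for every correctly identified sample.
    answer_score = sum(
        1 for sample, correct in answer_key.items() if answers.get(sample) == correct
    )
    # Evidence points are summed across samples, kept apart from the answer score.
    evidence_score = sum(evidence_points.values())
    return answer_score, evidence_score


# Hypothetical two-sample example.
key = {"A": "cornstarch only", "B": "cornstarch and baking soda"}
student = {"A": "cornstarch only", "B": "cornstarch only"}
print(score_student(student, {"A": 4, "B": 1}, key))
```

Keeping the two totals apart preserves the distinction the scoring rationale draws between reaching a correct identification and supporting it with adequate evidence.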

The score summary allows teachers to examine a random sample of ten
students in more depth to identify specific areas for instructional change. The
summary consists of a table where teachers list students' answer and evidence
scores, along with the number of times they identified confirming, disconfirming,
and irrelevant evidence. It also includes a series of questions to guide teachers'
interpretation (e.g., "Look at Column III. How many students used confirming
evidence?"). This process allows teachers to break scores down into specific
components to further direct their teaching efforts. For example, if most students
used confirming evidence to "rule in" powders but failed to use disconfirming
evidence to "rule out" alternative choices, the teacher should draw students'
attention to the characteristics of necessary and sufficient powder identification
evidence.
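
A score summary of this kind lends itself to a simple tabulation. The sketch below is an assumption-laden illustration: the row fields mirror the columns described above, but the class name, the function name, and the rule that "most" means more than half of the sampled students are choices made here, not part of the published materials.

```python
from dataclasses import dataclass


@dataclass
class SummaryRow:
    student: str
    answer_score: int
    evidence_score: int
    confirming: int      # times confirming evidence was identified
    disconfirming: int   # times disconfirming evidence was identified
    irrelevant: int      # times irrelevant evidence was identified


def interpret(rows):
    """Answer the guiding questions for a sample of students."""
    n = len(rows)
    used_confirming = sum(1 for r in rows if r.confirming > 0)
    used_disconfirming = sum(1 for r in rows if r.disconfirming > 0)
    return {
        "used_confirming": used_confirming,
        "used_disconfirming": used_disconfirming,
        # Assumed threshold: "most" is read as more than half of the sample.
        "emphasize_ruling_out": used_confirming > n / 2
        and used_disconfirming <= n / 2,
    }


# Invented rows for three students.
rows = [
    SummaryRow("A", 4, 10, 3, 0, 1),
    SummaryRow("B", 5, 14, 4, 2, 0),
    SummaryRow("C", 3, 8, 2, 0, 2),
]
print(interpret(rows))
```

In this invented sample every student used confirming evidence but only one used disconfirming evidence, the pattern that the text suggests should prompt instruction on ruling out alternatives.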

Unlike Facets, the Mystery Powders score form and summary do not provide
specific activities for remedying performance discrepancies. Rather, instructional
modifications are left to the teachers' discretion, and may be guided by district
professional development opportunities for integrating assessment with instruction
(Baxter et al., 1997). The scoring system, as it presently stands, nevertheless provides
teachers with the kinds of information necessary to alter and expand their practices.



Properties of Assessment to Inform Students 

Informative assessment for the learner considers how students can use
assessment results to improve their performance in the course of instruction. Two
such types of information for the student are the steps or strategies for a problem's
solution, and the explanation of the principles underlying a correct response. The
approach to informative assessment for learners that we consider here is one that
makes learning goals and performance quality explicit so that learners can easily
monitor and improve their thinking and reasoning.

This section reviews three design principles that make assessments informative
to students, as a means of identifying some of the elements of effective programs.
First, exemplary models of competence are described that familiarize students with the
criteria for successful assessment performance. Second, the value of graphical tools to
track progress is discussed with an emphasis on the principles underlying
informative graphs. Third, the value of structured opportunities for reflection and
revision is explored as a principle for maximizing the benefits of self-assessment.

Models of Competence 

Practical, clear, grade-level appropriate standards are essential for improving
student performance (Arter & McTighe, 2001). While this idea is not new (see the
discussions of assessment transparency in Frederiksen & Collins, 1989), it takes an
especially prominent role in the context of informative assessment for students. The
programs described here illustrate various forms of classroom goals or standards,
and the roles of teachers and peers in establishing performance expectations.

One example of performance standards is ThinkerTools, which incorporates
peer and self-assessment into inquiry-based science instruction (White, 1993; White
& Frederiksen, 1998). As students conduct investigations, they learn to evaluate their
performance according to several criteria, such as "Understanding the science,"
"Reasoning carefully," and "Writing and communicating well." Throughout the
various phases of inquiry (e.g., question, prediction, experiment), students apply at
least two such criteria to their work and others'. These reflective experiences, in
turn, give students a language for discussing their thinking and reasoning during
inquiry, leading to the identification and improvement of problem areas for
learning.

Other programs incorporate more implicit teacher- and student-mediated
assessment to model desired learning behavior. Fostering Communities of Learners
(FCL), for example, capitalizes on group activity to elicit evidence of student thinking
and strengthen comprehension and communication. In one such activity, guided
writing, students compose written and illustrated research reports with help from a
more knowledgeable expert (e.g., teacher, researcher, or older student). As students
work in small groups, the expert asks probing questions like "Do you think the
reader will be able to understand that?" (Brown, Ellery, & Campione, 1998).
Students' activities, the experts' assessment and feedback, and the students' ensuing
revision are immediate and interactive, and directed towards improved writing
proficiency.

Expert adults are not the only ones capable of modeling learning; students are
equally instrumental in shaping their classmates' understanding. This is evidenced
by the role of reciprocal teaching (RT) (Palincsar & Brown, 1984) in FCL classrooms.
Students work in small groups to understand information (e.g., article, video)
pertinent to their research topic. One student leads the discussion by summarizing
the material, asking questions about it, or prompting for predictions about future
research. The others contribute by clarifying their understanding and correcting
comprehension problems (Brown & Campione, 1996; Brown et al., 1998).

Reciprocal teaching exercises help students discern how well they understand
material by virtue of their ability to ask and answer questions about it. Peers also
provide feedback to one another by reacting to those questions and answers, and
demonstrating how well they understood the material. In turn, the more competent
students model comprehension for those at a lower level of understanding (Brown
et al., 1998). Two related activities, jigsaw and crosstalk, have students gather and
share research in order to teach their peers. Questions during sessions direct
students to the information they must obtain in order to clarify their peers'
understanding (Brown et al., 1998).

The effectiveness of models of competence depends on the quality of teacher
and peer support, as demonstrated by the preceding examples. Studies of middle
schoolers' creation of hypermedia documents (Erickson & Lehrer, 1998) further
corroborate this claim. Over the course of two years, students in an urban
middle school learned to design multimedia presentations and, in so doing, generate
expansive research questions (i.e., topics that encouraged inquiry) and successfully
communicate ideas. Throughout instruction, students evaluated peers' work and
internalized "critical standards" for questions and project designs. Teachers
scaffolded these discussions by elaborating on student ideas, identifying examples
of exemplary work, and articulating performance criteria. With time and practice,
students took more control over class discussions, appropriated the teachers'
guidelines into their own discourse, and developed a peer critique sheet for
assessing their hypermedia documents (Erickson & Lehrer, 1998). In this case, as in
the ThinkerTools and Fostering Communities of Learners projects, assessment was
made informative to students through the introduction of standards and the
gradual reduction of teacher reinforcement.

Graphical Tools to Track Progress 

Informative assessments for students incorporate methods for visualizing 
progress over time. Like teachers, students also need concise, comprehensible 
records of past and present accomplishments in order to set future learning goals. 
Some assessment programs address this need through student-friendly graphical 
displays. Such efforts must be tempered by research on students' difficulties with 
interpreting graphs and other inscriptions, as will be noted shortly. 

KIDMAP (NSWDET, 2001) and the Berkeley Evaluation and Assessment 
Research group (BEAR) (Roberts et al., 1997; Wilson, Draney, & Kennedy, 2001; 
Wilson & Sloane, 2000) graph students' achievement against subject-specific criteria 
and/or classroom norms for the purposes of reflection and review. These reports
typically take the form of percentile rankings across students, or a chart or graph of 
performance on several variables rated on the same ordinal scale. The BEAR 
assessment system for a middle-school science curriculum, for example, rates 
students on a scale from zero to four on several inquiry variables such as designing 
and conducting investigations and understanding concepts. A student's 
performance is charted with respect to these variables and accompanied by
suggestions for improving performance (e.g., "Explain any unexpected results," 
"Think about other ways you could use the scientific information") (Wilson et al., 
2001). KIDMAP likewise displays performance on a five-point scale (from 
"Progressing Towards" to "Working Beyond") on selected objectives, supplemented 
with more detailed comments. The BEAR and KIDMAP reports can then be used in 
conferences with the student and/or parents to discuss progress and set appropriate
learning goals (NSWDET, 2001; Roberts et al., 1997; Wilson & Sloane, 2000). 

Graphing student progress may well prove fruitful for optimizing assessment 
informativeness. The promise of graphing, however, must be tempered with 
evidence of students' problems in interpreting such representations (Bowen & Roth, 
2002; Leinhardt, Zaslavsky, & Stein, 1990; Roth & McGinn, 1998; Vekiri, 2002). 
Progress maps and other graphs could play a significant role toward the goal of 
fostering interpretative skills, but they must be designed properly. Students are 
most likely to understand graphs that require the least amount of cognitive 
processing, either because of a simplified spatial organization (i.e., all related 
information is grouped together), or auditory or written explanations identifying the 
graph's purpose and directing students' attention to key interpretive details (Vekiri, 
2002). BEAR and KIDMAP incorporate these principles by including concrete 
suggestions for future learning, and advocating discussions of graphed progress 
among teachers, students, and parents. If interpretation is properly scaffolded in
ways such as these, graphs can be extremely valuable tools for informative
assessment among students. 

Structured Opportunities for Reflection and Revision 

Informative assessment is instrumental in the development of self-monitoring 
expertise and establishes routine opportunities to reflect on performance. Initially, 
learning involves a significant degree of external support that is controlled by the 
teacher or the tutor. As learning proceeds, there are increasing opportunities for the 
acquisition of self-regulatory skills, and the identification and discrimination of 
criteria for high levels of performance. As this stage progresses, the design of the 
instructional situation becomes increasingly under the control of the student as a 
developing expert. There is a selective use of external support with students 
observing the performance of other students and calling on the advice of the teacher 
only as needed. 

A number of features can be incorporated into assessment situations so that the 
student can work with others to observe performance and receive feedback, with the 
opportunity to refer to supporting materials or the instructor as necessary. This 
report has already discussed the roles of models of competence and graphs in 
facilitating self-assessment. Another critical feature we discuss here is the need for 
deliberate, structured opportunities to reflect on performance.

Consider the lessons to be learned from selected programs that promote
self-assessment. One is that self-assessment must be accompanied by
suggestions for improvement if it is to facilitate learning. ThinkerTools (White, 1993;
White & Frederiksen, 1998), as noted before, provides habitual opportunities for
students to assess their progress on selected aspects of science inquiry. Class
discussions, teacher feedback, and opportunities to evaluate anonymous reports
ensure that students understand the assessment criteria and their appropriate
application (White & Frederiksen, 1998). An equally important component of
reflective assessment is information on how to succeed on projects, as
communicated through evaluation criteria, conversations with higher-performing
peers, class discussions, and teacher feedback. If students only evaluate their
performances without understanding how to improve them, they run the risk of
being demoralized by low scores and convinced that they lack the ability to do well
(White & Frederiksen, 1998).

Suggestions for promoting reflection through assessment must be taken with
some caution. Designing prompts for productive reflection is a complex process that
must take learners' beliefs and abilities, and learning environment characteristics,
into consideration. Programs such as ThinkerTools (White & Frederiksen, 1998)
demonstrate the types of activities that help students design and understand
standards for performance, and then use those standards to evaluate their own
learning. At the same time, it cannot be assumed that all prompts or reflective
activities will be equally effective for all students. For example, giving middle-school
students specific reflective prompts during inquiry-based science instruction (e.g.,
"To do a good job on this project, we need to ...") can actually be detrimental to
students who have moderate trouble taking responsibility for their own learning.
Generic prompts (e.g., "Right now we're thinking about ..."), on the other hand,
seem to allow those students the opportunity to reflect on ideas of their own
choosing, which in turn leads to better performance (Davis, 2003). This is not to
suggest that generic prompts are suitable for all classrooms, as they have primarily
been examined in highly scaffolded computer-supported learning environments
(Davis, 2003).

Studies of individual differences in reflection suggest that there is no ideal
template for assessment design that is going to elicit productive monitoring and idea
revision for all learners in all instructional situations. Rather, self-assessment
opportunities place extra responsibilities on teachers to monitor students'
performance, identify ongoing difficulties (e.g., overestimation of abilities), and
correct misunderstandings. Opportunities for reflection and revision may need to be
tailored to the needs of individual learners, with future research focusing on how
this might be accomplished (Davis, 2003).

One way to accommodate individual differences in monitoring ability is 
through one-on-one conversations about student performance. Although these 
conversations are typically associated with private conferences about portfolios (e.g., 
Courtney & Abodeeb, 1999; Klimenkov & LaPick, 1996; Schipper & Rossi, 1997), 
they can also be conducted during class discussions of subject matter (Clarke, 
McCallum, & Lopez-Charles, 2001). Effective interactions relate students' self- 
evaluation to specific criteria, direct questions to particular students (e.g., asking for 
evaluations of certain criteria with which a student has had problems in the past), 
probe for details, provide feedback about the quality of self-evaluations, and 
generate goals and plans for future work (Courtney & Abodeeb, 1999; Clarke et al., 
2001). For instance, one second-grade teacher worked with students to create three
individualized learning goals, which she recorded on index cards and taped to 
students' desks as reminders (Courtney & Abodeeb, 1999). Because personal 
interactions permit immediate recognition of, and response to, student difficulties
(Bell & Cowie, 2001; Cowie & Bell, 1999), they might be more amenable to 
promoting self-assessment than are structured, group-administered assessments 
alone. 



Conclusion 

Evaluation used to improve the course while it is still fluid, contributes more 
to the improvement of education than evaluation used to appraise a product 
already placed on the market . . . Hopefully, evaluation studies will go beyond 
reporting on this or that course and help us understand educational learning. 
Such insight will, in the end, contribute to the development of all courses 
rather than just the course under test. (Cronbach, 1963, p. 675). 



This quotation is a commentary on the progress that has been made in uniting 
instruction and assessment for the improvement of classroom learning. In the
four decades since Cronbach wrote about formative evaluation, a central assessment
goal has remained the same: to develop optimally informative measures of
understanding that productively redirect teacher and student behavior. During that
time, strides have been taken to achieve this goal, including a reconceptualization of
what it means to "know" a discipline, the merger of psychometrics with cognitive
science, new observational and statistical models for documenting learning, and the
emergence of models of exemplary assessment practice (NRC, 2001). Efforts must
now also be made to operationalize assessment development in as practical,
detailed, and comprehensive a manner as possible, until informative tools can
become commonplace in classroom learning environments.

This report has identified several principles of assessment programs that
inform teachers and students of performance on relevant aspects of achievement
and suggest directions for future learning. Principles fell into three general
categories: (a) descriptions of cognitive activity, (b) presentation of assessment goals
and feedback, and (c) transformation of feedback into recommendations for teaching
and learning. Our efforts, while highlighting a selection of desirable assessment
properties, also represent one effort to "accumulate, synthesize, and disseminate
existing knowledge" (NRC, 2001, p. 299) for the purposes of articulating assessment
development, particularly with regard to analyses that "cut across effective
exemplars with the goal of identifying and clarifying the new science of assessment
design" (NRC, 2001, p. 304).

This paper has concentrated on the interpretive elements of assessment design
and supporting materials. Similar reports could discuss, for instance, the qualities of
teacher feedback that sustain learning (e.g., Black, Harrison, Lee, Marshall, &
Wiliam, 2002; Black & Wiliam, 1998; Suffolk County Council, 2001), or the elements
of task design in various fields (e.g., Baxter & Glaser, 1998; Solano-Flores &
Shavelson, 1997; Sugrue, 1995). Analyses of particular forms of assessments such as
performance assessments or short-answer questions could be especially fruitful.
Haladyna, Downing, and Rodriguez's (2002) taxonomy of multiple-choice guidelines,
derived from a synthesis of research studies and educational measurement
textbooks, represents one approach with promising implications for future item
creation guidelines.

Informative assessment can also benefit from a design research approach
(Brown, 1992; Edelson, 2002) whereby task development is systematically
documented, with results generalized to various classes of assessment contexts (e.g.,
across subject matters, units within the same subject, age groups, and assessment
types). This information is a rare commodity; assessment programs are generally
presented as final products, with little discussion of the creation process, design
evaluation and revision, and lessons learned along the way. If such details were
made available for public consideration (e.g., taking the form of design cases;
Edelson, 2002), they would make invaluable contributions to defining the "design
space" (NRC, 2001), establishing the parameters and principles for the science and
technology of informative classroom assessment design.

A Working Language of Cognition for Educators 

In addition to research on the development of informative assessments, we
urge that a language be developed that can serve as a source of ready
communication between researchers and educators. Examples of such a language
have already been presented in this paper in the discussion of assessment properties
to inform teachers. More work is needed, however. This discourse would be a
merger of learning concepts and cognitive processes that guide and develop
practices in the classroom in terms of the feedback provided by informative
assessment. The teacher would use this feedback to influence student performance
in learning various subject matter. A general set of principles would be incorporated
for practices involved in different subject matters. A major recommendation,
therefore, is the development of a language of learning and cognition for teacher
practice, at a level of discourse that relates learning processes to instructional
performances. Consideration would need to be given both to general concepts that
cut across domains and to specific kinds of performance required in a variety of
subject matters and school situations.

The effort to accomplish this objective would reference a consistent vocabulary
that would consider the context and description of performance, as well as the
processes of learning. Working back and forth between descriptions of instructional
situations and learning terminology would enable the refinement and emergence of
discourse conventions. The goal of this effort would be developed on the basis of
shared usage in a variety of situations. Sources and test situations for this working
language for informative assessment would involve an integration of disciplinary
concepts and learning theory, an analysis of the educational context, and
descriptions of various levels of performance and developing proficiency.

A Classroom Environment of the Future

Efforts to construct informative assessments will significantly impact classroom 
environments that promote active knowledge. In these classrooms, teachers and 
students model practices encouraged by cognitive and situative learning theories, 
such as attention to prior knowledge, concentration on learning with understanding, 
and establishment of a community of learners (NRC, 2000, 2001). Assessments have 
been and will continue to be an integral part of this process by providing 
information that facilitates the design of adaptive learning environments. To teach 
for understanding, teachers must receive insights about students' thinking, 
reasoning, and problem solving that they can use to modify subsequent learning 
opportunities. To learn with understanding, students must obtain feedback about 
their performance that helps them revise their ideas and evaluate future work. 

The classroom thus becomes an environment of interchange interspersed with
periods of reflection and problem solving. The teachers' reflection comes about as a 
result of interpretation of student performance; the students' reflection is the result 
of their reactions to feedback presented over the course of learning. There are 
periods in which both teachers and students appear to come together in appreciation 
that their activity has been successful, with the teachers planning for next steps and 
the students anticipating the development of new knowledge. 

This report has identified principles of informative assessment development 
taken primarily from localized, researcher-supported programs. These practices can 
now be studied and expanded to the mainstream. Projects that identify development 
principles and disseminate those ideas to teachers will be particularly helpful in this 
regard. There is no question that effective, practical, informative assessments can 
make substantial inroads in improving classroom teaching and learning. The 
challenge that lies ahead is to capitalize on the principles underlying exemplary 
assessment programs and undertake development efforts to make informative 
assessment opportunities accessible to all. 